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Abstract 

We report on a scale determination with gradient-flow techniques on the Nf = 2 + 1 + 1 highly 
improved staggered quark ensembles generated by the MILC Collaboration. The ensembles include 
four lattice spacings, ranging from approximately 0.15 to 0.06 frn, and both physical and unphysical 
values of the quark masses. The scales y/to/a and wo/a and their tree-level improvements, yA+mp 
and u+impj are computed on each ensemble using Symanzik flow and the cloverleaf definition of 
the energy density E. Using a combination of continuum chiral-perturbation theory and a Taylor- 
series ansatz for the lattice-spacing and strong-coupling dependence, the results are simultaneously 
extrapolated to the continuum and interpolated to physical quark masses. We determine the 
scales y/to = 0.1416(1|) fm and wo = 0.1714(1]J) fm, where the errors are sums, in quadrature, of 
statistical and all systematic errors. The precision of wq and y/t$ is comparable to or more precise 
than the best previous estimates, respectively. We then find the continuum mass dependence of 
y/to and wo, which will be useful for estimating the scales of new ensembles. We also estimate the 
integrated autocorrelation length of ( E(t )}. For long flow times, the autocorrelation length of (E) 
appears to be comparable to that of the topological charge. 


* Present address: Department of Physics and Astronomy, University of Iowa, Iowa City, Iowa, 52242 USA 
i brownnathan@wustl.edu 
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I. INTRODUCTION 


Scale setting holds central importance in lattice QCD for two reasons. First, the contin¬ 
uum extrapolation of any quantity, dimensionful or dimensionless, requires a precise deter¬ 
mination of the relative scale between ensembles with different bare couplings. Second, the 
precision to which one may determine a dimensionful quantity in physical units is limited by 
the precision of the scale in physical units (the absolute scale). Because scale setting limits 
the precision of so many calculations, it is important to identify quantities with the highest 
level of precision to set the scale. 

To make progress towards this goal a thorough understanding of the restrictions on quan¬ 
tities that may be used for scale setting is required. In principle, any dimensionful quantity 
that is finite in the continuum limit may be employed. The relative scale may be set by cal¬ 
culating a dimensionful quantity and comparing its value in lattice units at different lattice 
spacings for the same quark masses. For absolute scale setting, one needs to compare the 
quantity in lattice units to the physical value. If the quantity is experimentally accessible 
the comparison to the physical value is straightforward. For a quantity that is inaccessible 
to experiments, its physical value in the continuum is inferred by comparison to an experi¬ 
mental quantity. In other words, an experimental quantity may be used directly for relative 
and absolute scale setting, but a quantity that is inaccessible to experiments requires the 
lattice measurement of a second, experimentally accessible quantity for absolute scale set¬ 
ting. The use of a nonexperimental quantity for scale setting may still be worthwhile if it 
can be determined on the lattice with small statistical and systematic errors for relatively 
small computational cost. This is due to the large gain in control over continuum extrap¬ 
olations at the cost of a small decrease in the precision of absolute scales. This has led to 
the consideration of theoretically motivated, but not experimentally measurable, quantities 
such as r 0 and rq [1, 2], F pis [3], and, more recently, [4] and w 0 [5] from gradient flow 
[6, 7], 

The ideal scale-setting quantity has small statistical and systematic errors. However, since 
systematic errors arise from a variety of sources, such as discretization effects, dependence on 
the simulation (possibly unphysical) quark masses, finite-volume effects, and excited states, 
it is difficult to reduce all error sources simultaneously. For example, the scales ro and r\ 
are computed from asymptotic fits in time t to the heavy-quark potential V (r) with quark 
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separation r, such that r 2 dV/dr = 1.65 or 1, for r = and r 4 , respectively [1, 2], The 
statistical errors in V(r) are generally small, but they grow with t/a and may become a 
problem at small lattice spacings where larger values of t/a are needed to reduce systematic 
errors from excited states [3]. As another example, consider F p4s , the fictitious pseudoscalar 
decay constant with degenerate valence quarks of mass m v = 0.4m s and physical sea-quark 
masses [3]. The value of the valence-quark mass is chosen to be heavy enough to make 
it not too expensive to compute the correlators, but light enough for chiral-perturbation 
theory to apply. However, F p4s has strong dependence on the valence-quark mass. Thus, 
relatively small errors in determining arn s , the physical value of the strange-quark mass in 
lattice units, may lead to significant errors in aF p4s through the value of the valence mass, 
am v = 0Aarn s . Further, the required asymptotic fits to correlators are difficult to automate 
and usually require significant human intervention. 

Gradient flow [6, 7] has received considerable attention [8-11] over the past few years 
because it is a theoretically grounded smoothing operation that is simple to implement and 
can be used to obtain precisely determined scales. The basis for scale setting with gradient 
flow is the determination of the flow time for which a dimensionless, precise, and easily 
computable quantity is smoothed to a predefined value. The original quantity proposed by 
Liischer, f 0 , is defined through the gauge field energy density [4], Most modifications focus 
on reducing discretization errors in the same underlying flow or observable [5, 8, 12, 13]. 
All of these scales can be easily computed to a statistical precision of 0.1% or less and 
have small quark-mass dependence. Finite-volume effects, the only remaining sources of 
systematic error for relative scale setting, may also be kept very small. 

Here, we present our computation of the gradient-flow scales i /to/a and wq/ a on the 
MILC, (2+l+l)-flavor, highly improved staggered quark (HISQ) ensembles [3, 14], The 
HISQ configurations used in this analysis cover lattice spacings from a ~ 0.15 to 0.06 
fm and include ensembles with physical, or heavier than physical, light-quark masses, and 
physical, or lighter than physical, strange-quark mass. The charm-quark mass is kept near 
its physical value. We perform a continuum extrapolation and interpolation to physical 
quark masses of WoF p4:S and \/toF pis to determine the two scales in physical units, using 
our previous determination of F p as in physical units [15]. We find y/t$ = 0.1416(%5) fm 
and wo = 0.1714(l]l^|) fm, where statistical and all systematic errors have been added in 
quadrature. 
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We start with a review of the relevant theoretical details, including the gradient-flow 
equation in Sec. II A, definitions of the scales f 0 and wq in Sec. IIB, chiral-perturbation 
theory for flow quantities in Sec. II Bl, and lattice-spacing dependence in Sec. IIB 2. The 
computational setup is described in Sec. Ill A. We discuss the raw lattice results in Sec. Ill B, 
include a brief comparison of the results for different ensemble-generation algorithms in 
Sec. Ill B 1, and estimate the integrated autocorrelation lengths in Sec. Ill B 2. Leading-order 
adjustments for charm-quark-mass mistiming are performed in Sec. Ill B 3, and a simple 
extrapolation to the continuum of the results on the physical-mass ensembles is presented 
in Sec. Ill B 4. Section III C then describes the quark-mass interpolation and continuum 
extrapolation. We present our results for w$ and a /to in physical units in Sec. IV A, and 
include comparisons with our earlier preliminary results. The continuum mass dependence of 
v^o and u>o is deduced from our fits in Sec. IV B and used to compare the scales determined 
from the gradient flow to those determined from F pis in Ref. [15]; knowing the continuum 
mass dependence will be useful in determining the scales of new ensembles. Section V 
compares our results to those of other collaborations, and tabulates the precision of various 
methods for relative scale setting. 

Preliminary versions of this analysis have been described in Refs. [16] and [17]. 


II. REVIEW OF GRADIENT FLOW 

This section summarizes the theoretical details of gradient flow from Refs. [4-7, 13, 18] 
that are relevant to the scale-setting analysis in later sections. 


A. Diffusion equation 

Gradient flow [6, 7] is a smoothing of the original gauge fields A towards stationary points 
of the action S. The new, smoothed gauge fields B{t) are functions of the “flow time” t and 
are updated according to the diffusionlike equation below, where go is the bare coupling. 

= m; = A ' GW ‘ • B “ {0) = A * • (1) 

d v :x = d„x + (b„, x], c\,„ = - a„B„ + bj . 

On the lattice, the Yang-Mills action is replaced by an appropriate discretized version. 
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( 2 ) 


The gauge link V at site i in direction ju is updated in time according to 


dV(t)i,, 

dtt 


= ~9 o' 


idS(V) 


dVi 






M°) = u,. f 


hi* 


The change of V(t) with flow time explicitly follows the steepest descent of the action with 
respect to the gauge field, with an additional factor of V l . tl in the lattice formulation to 
ensure gauge covariance. For more details on the SU(3)-valued derivative, see the Appendix 
of Ref. [4], 

As the flow time t increases, the gauge fields diffuse and short-distance lattice artifacts are 
removed. After modifying the flow equation with a flow-time-dependent gauge transforma¬ 
tion of the field one can explicitly see the suppression of high momenta in the leading-order 
perturbative expansion of the gauge field in powers of the coupling g 0 [4]: 


Bn(x,t) 


(4 nty 


, B„(p, i) ss AMe^ 


( 3 ) 


B. Gradient-flow scales 


The process of gradient flow introduces a dimensionful, independent variable, the flow 
time. Since all quantities calculated from smoothed gauge links will be functions of the flow 
time, one may define a scale by choosing a reference time at which a chosen dimensionless 
quantity reaches a predefined value. If the dimensionless quantity is also finite in the con¬ 
tinuum limit, then the reference time scale will be independent of the lattice spacing up 
to discretization corrections in powers of a 2 . One of the easiest dimensionless quantities 
to calculate with only gauge fields is the average total energy within a smoothed volume 

V oc t 2 . Up to a dimensionless constant, this is equivalent to calculating the product of 
the energy density and squared flow time t 2 (E(t)). Luscher and Weisz have shown that the 
energy density is finite to all orders (when expressed in terms of renormalized quantities) 
[19], so t 2 (E(t )) is a suitable candidate for setting the scale. A fiducial point c is chosen, 
and the reference scale is defined to be the flow time to where 

tl(E(to))=c. (4) 

The fiducial point should be chosen so that for simulated lattice spacings a and volumes 

V = L 3 T (with T > L), the reference time scale to falls within a -C \/8to aL. The value 
of c = 0.3 has been found, empirically, to satisfy this relation [4, 5]. A larger fiducial point 
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of c = 2/3 has also been proposed in order to reduce discretization errors, at the expense of 
somewhat larger f ini te-vol um e effects [8]. 

The renormalized expansion of (. E(t )) to second order in g shows t 2 {E(t )) is approxi¬ 
mately constant [4], For small flow times this agrees with computational results, but for 
larger flow times (including the scale to) t 2 {E(t)) is found empirically to be linear in t [4, 5]. 
The transition of ( E(t )) from t~ 2 to f _1 dependence is nonperturbative. However, we ex¬ 
pect discretization errors to enter primarily for small flow times, before the lattice details 
are smoothed away. In accordance with this expectation, empirical evidence suggests that 
discretization effects have less impact on the slope of t 2 (E(t)) at comparatively larger flow 
times near the fiducial point, than they do on t 2 (E(t )) itself [5]. Assuming the property is 
general, an improvement to the scale to is computed by considering the slope: 


4 /W)) 


(5) 


where w o is the improved scale. Again, the value of the fiducial point c = 0.3 or c = 2/3 is 
chosen to avoid discretization and finite-volume effects. 


1. Chiral-perturbation theory 


Because both scales i Q and w 0 are defined in terms of the energy density (. E(t )), and the 
energy density is a local, gauge-invariant quantity, chiral-perturbation theory can be applied 
to determine the quark-mass dependence of the scales. This is an advantage over some other 
scales, such as ro or r±, for which no chiral-perturbation theory expansion is available. The 
mapping of ( E(t )) to the chiral effective theory has been carried out by Bar and Golterman 
in Ref. [18]. The expansion for y^ in the Nf = 2 + 1 case in terms of the pion and kaon 
mass is 


v^o — y/t 0 ,ch 


1 + 


, 2 M 2 K + hl 

1 (4vr ff 


———= ( (3k 2 — + 7;k\{M 2 — AMk)h v + k-zM 2 

(47i7r V 4 

, (2M 2 +M 2 ) 2 , , (Mk — M 2 ) 21 
+ ki tj-ju + k 5 —— 


Ev 


( 6 ) 


where to, c h is the value of to in the chiral limit, the chiral logarithms are represented with the 
shorthand = (Mq/Att f) 2 log (Mq//x) 2 , and the ki are low-energy constants (LECs) that 
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depend on the flow time. Note that chiral logarithms enter only at next-to-next-to-leading 
order (NNLO). The scale w 0 has the same expansion form to NNLO, but with different 
coefficients fcj. This is because the flow-time dependence of (. E(t )) appears only in the 
LECs, allowing the differences between Eqs. (4) and (5) to be absorbed into redefinitions of 
the LECs. 

One can generalize Eq. (6) to staggered chiral-perturbation theory in order to explicitly 
take into account discretization effects from staggered taste-symmetry violations. In this 
paper, however, we have used simple polynomial expansions to parametrize lattice-spacing 
effects. There are two reasons for this choice. First, the quark-mass dependence of the 
gradient-flow scales is already small, as will be evident in Sec. IV B, and nontrivial staggered 
effects would come in only with the chiral logarithms, which are of NNLO. For HISQ quarks, 
such effects are very small. Second, the number of undetermined coefficients in staggered 
chiral-perturbation theory expansions would be too large in comparison to the number of 
independent data points available for interpolations. Unlike analyses of pseudoscalar masses 
or decay constants, here we have no valence quarks whose masses could be varied to increase 
the size of the data set. 


2. Discretization effects 

In determining the scales to and wo, lattice artifacts enter in three places: the action 
used to generate the initial configurations, the action of the gradient flow, and the choice of 
observable. Because ensemble generation is expensive, the action chosen for generating the 
gauge configurations is fixed in practice. Therefore, we only consider improvements to the 
gradient flow and energy density. 

Empirical results suggest partial improvements of the flow or the energy density can yield 
smaller 0(a 2 ) terms. By using the tree-level improved Symanzik action instead of the Wilson 
action in the flow, the BMW Collaboration found smaller cutoff effects for both gradient- 
flow scales on their Wilson-clover ensembles with 2-HEX smearing (with scale set by Mq) 
[5]. Similarly, using the symmetric, cloverleaf definition of the field-strength tensor G in 
(E) = GpnGuv/ 4, instead of the simpler sum over the plaquettes, yielded cutoff effects in 
y/to/ro that were five times smaller [4], Of course, applying partial improvements at different 
steps is not guaranteed to produce smaller cutoff effects in the final result. Also, for each 


case, the lattice-spacing dependence of the gradient-flow scale cannot be cleanly separated 
in the numerical results from the dependence of the additional quantity used to set the scale 
in the extrapolation to the continuum. 

A detailed examination of the discretization effects on gradient-flow scales has been re¬ 
cently carried out in Ref. [13]. The net lattice-spacing dependence from all three stages of 
the calculation (dynamical action, flow, and observable) is determined at tree level in the 
gauge coupling from a calculation of (E(t)) at finite lattice spacing. For the clover observable 
chosen in this study 

F(t) = (t 2 E(t)) = 3(A ]],~) )g ° (1 + CW/t + 0(a 4 /t 2 ) + O( jo 2 )) , (7) 

C 2 = 2c f + -Cg - — , (8) 

where the coefficient Cf describes the gradient-flow action, and c g describes the original gauge 
action used to generate the ensembles [13]. For our choices of Symanzik one-loop-improved 
gauge action ( c g = —1/12 at tree level) and Symanzik tree-level gradient flow (c/ = —1/12), 
we have C 2 = —19/72. Unfortunately, our choices of actions and observable lead to larger 
tree-level discretization terms than from many other combinations of common choices of 
action for the flow and observable. For more details see Table 1 in Ref. [13]. 

Utilizing the known a 2 dependence of F(t), “improved” scales are defined in Ref. [13] by 
canceling the tree-level contributions to F(t) in the implicit definitions of t 0 and Wq\ 

' _ F 

(1 + C 2 {o?jt ) 

" d t 2 

dt (1 + C 2 (a 2 /t ) 

For clarity, we will use to.orig and tco,ori g from here on to refer to the original definition of 
t 0 and Wq, and reserve the notations to and Wo to refer generically to both the original and 
improved versions, or to discuss their continuum limits, which is of course common to both 
versions. The tree-level improvement in Eqs. (9) and (10) is not obviously an improvement 
in the nonperturbative region of flow-time where the scales are determined. However, the 
tree-level improvement may be worthwhile if discretization errors arise predominantly from 
the small-f region, as observed by BMW [5]. We compute the improved scales to,imp and 
wo,imp and compare it to the a 2 dependence of the original scales in Sec. Ill B. 

An additional theoretical handle on the comparison can be made by expanding the original 
scales directly as a power series in a 2 and calculating the coefficients. The lattice-spacing 


(m) 


+ C 4 (a 2 /f ) 2 + ...) 

(E(t)) 


= c . 


t=to\ 


+ C 4 (a 2 /t ) 2 + ...) 


= c 


(9) 

( 10 ) 


t=wi ■ 


0,imp 
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dependence of the gradient-flow scales are proportional to C 2 and depend on the continuum 
flow-time dependence of F(t) and its derivatives F'(t) = tj^F(t) and F"(t ) = t 2 ^F{t) 
evaluated at the corresponding continuum scale t = t 0 or t = w^. The next-to-leading-order 
coefficients are given by 

Vori E =*c.(l-+)]), r 2 = C 2 T «. -0.3568(2) , ( 11 ) 

“0,o,i, = < (1 - W 2 ^j , W 2 = « 0.070(2) , (12) 

Note that the coefficients T 2 and W 2 are identical to those derived for the improved scale 
in Ref. [13]; however, the a 2 coefficients in the above expression are —T 2 and — W 2 because 
Eq. (11) relates the (unimproved) scales at finite lattice spacing to the continuum scales. The 
numerical evaluation of F, F 1 , and F" for the estimates of T 2 and W 2 has been performed 
on the a ~ 0.06 fm, physical quark-mass ensemble (see Table I). No systematic errors are 
included in these estimates. Unfortunately, because ^/t 0 orig and u+orig are defined at flow 
times outside the perturbative regime, the systematic error on T 2 and W 2 from higher order 
and nonperturbative contributions to our estimates of F, F', and F" is not known. Since 
discretization errors for to,orig appear to enter primarily at short flow times [5], nonpertur¬ 
bative contributions to T 2 may well be small. However, there is no corresponding evidence 
to support a similar conclusion for W 2 . This is discussed further in Sec. Ill C. 

III. DETAILS OF THE COMPUTATION 

We compute the scales v^o.orig/a, w 0iO ri g /a, y/t 0pmp /a, and w 0)imp /a on the MILC Nf = 
2 + 1 + 1 HISQ ensembles [3, 14]. Before describing the gradient-flow simulation details, we 
tabulate the properties of the ensembles and those quantities needed from prior analyses. 
Tables I and II list the parameters and relevant observables for ensembles with the strange 
sea-quark mass tuned near its physical value, and well below its physical value, respectively. 
Table III gives the values of aF p4s at physical quark masses and associated lattice spacings, 
which are needed for continuum extrapolations. The lattice spacings are calculated with 
a mass-independent scale-setting scheme; the continuum value F p4s = 153.90(9)(l 2 g)MeV 
is taken from Ref. [15], where f n was used to set the absolute scale. Physical values of 
am s at each lattice spacing [15] are also tabulated. Using the physical quark-mass ratio 
m c /m s = 11.747(19) (11) [15], these values of am s determine values of the physical charm- 
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TABLE I. HISQ ensembles with near-physical strange sea-quark mass. The first three columns list 
the gauge coupling constant, the approximate lattice spacing, and the ratio of light- to strange-sea- 
quark mass. The fourth and fifth columns list the strange and charm sea-quark masses, respectively. 
(Quark masses with primes indicate simulation values of the ensemble, whereas unprimed masses 
indicate physical values.) All but two ensembles can be uniquely identified by the second and 
third columns. To differentiate between the two a ~ 0.12 fm, m!,/m' s = 1/10 ensembles we use the 
dimensions of the lattice, N 3 x A/, given in column 6. The last two columns give the taste-Goldstone 
pion and kaon masses in lattice units. 


p 

« a(frn) 

m'Jm'g 

am' s 

am' c 

Nf x N t 

aM n 

ciMk 

5.80 

0.15 

1/5 

0.0650 

0.838 

16 3 X 48 

0.23653(22) 

0.40261(25) 

5.80 

0.15 

1/10 

0.0640 

0.828 

24 3 x 48 

0.16614(10) 

0.38067(16) 

5.80 

0.15 

1/27 

0.0647 

0.831 

32 s x 48 

0.10180(09) 

0.37093(16) 

6.00 

0.12 

1/5 

0.0509 

0.635 

24 3 x 64 

0.18917(15) 

0.32358(20) 

6.00 

0.12 

1/10 

0.0507 

0.628 

32 s x 64 

0.13424(09) 

0.30813(15) 

6.00 

0.12 

1/10 

0.0507 

0.628 

40 3 x 64 

0.13400(06) 

0.30821(09) 

6.00 

0.12 

1/27 

0.0507 

0.628 

48 3 x 64 

0.08153(04) 

0.29851(11) 

6.30 

0.09 

1/5 

0.0370 

0.440 

32 s x 96 

0.14055(17) 

0.24061(18) 

6.30 

0.09 

1/10 

0.0363 

0.430 

48 3 x 96 

0.09852(08) 

0.22688(12) 

6.30 

0.09 

1/27 

0.0363 

0.432 

64 3 x 96 

0.57215(04) 

0.21946(09) 

6.72 

0.06 

1/5 

0.0240 

0.286 

48 3 x 144 

0.09438(16) 

0.16191(16) 

6.72 

0.06 

1/10 

0.0240 

0.286 

64 3 x 144 

0.06713(06) 

0.15452(09) 

6.72 

0.06 

1/27 

0.0220 

0.260 

96 3 x 192 

0.03887(03) 

0.14269(06) 


quark mass for each ensemble in lattice units, which in turn will be used to adjust for 
mistunings of the charm sea-quark mass in Sec. Ill B 3. Finally, Tabic III lists the effective 
coupling constant a s calculated from taste violations of the HISQ pions in Ref. [15]. The 
couplings are scaled by a constant so that a s = ay(q* = 1.5/a) for /3 = 5.8, where ay 
is determined from the plaquette [3, 20]. The values of a s are used below in continuum 
extrapolations. 
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TABLE II. HISQ ensembles with a lighter-than-physical strange sea-quark mass. All ensembles 
have gauge coupling constant /3 = 6.00 and lattice spacing a « 0.12 fm. The first two columns list 
the approximate values of the light sea-quark mass m[ and strange sea-quark mass m! s in units of 
the physical strange-quark mass m s . All of the ensembles may be uniquely identified by these two 
columns. The remaining columns are equivalent to those in Table I. 


~ m'Jrris ? 

« m' s /m s 

am' c 

IV 3 x N t 

aM n 

aMx 

0.10 

0.10 

0.628 

32 3 x 64 

0.13181(10) 

0.13181(10) 

0.10 

0.25 

0.628 

32 3 x 64 

0.13250(09) 

0.17385(11) 

0.10 

0.45 

0.628 

32 3 x 64 

0.13275(10) 

0.21719(12) 

0.10 

0.60 

0.628 

32 3 x 64 

0.13324(10) 

0.24509(13) 

0.175 

0.45 

0.628 

32 3 x 64 

0.17491(10) 

0.23199(12) 

0.20 

0.60 

0.635 

24 3 x 64 

0.18850(17) 

0.26382(18) 

0.25 

0.25 

0.640 

24 3 x 64 

0.20903(19) 

0.20903(19) 


A. Computational setup 


We solve the gradient-flow differential equation numerically using the Runga-Kutta al¬ 
gorithm generalized to SU(3) matrices, as originally proposed by Luscher [4]. The routine 
discretizes the flow time with a step size e and computes the gauge configuration at a later 
flow time t = ne by iterating from the initial gauge configuration. The total error of the 
integration up to flow time t scales like e 3 . For all ensembles analyzed in this paper, we 
find that the scales y/t 0jOl[g /a and u> 0 ,ori g /a determined at a step size of e = 0.07 cannot 
be differentiated, within statistical errors, from those at e = 0.03. We therefore consider 
e = 0.03 to be a conservative step size, and employ it for all results presented below. 

Both the Wilson and Symanzik tree-level actions for the gradient flow have been im¬ 
plemented and are publicly available in the current release of the MILC code [21], This 
computation uses the Symanzik tree-level improved action in the gradient flow and the 
symmetric, cloverleaf definition of the field-strength tensor G /tiy in (E) = G^G^/A. 
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TABLE III. Values of am s , aF p 4 S , a (in femtometers), and a s adjusted to physical values of the 
quark masses, for various couplings (5. All results are from the analysis presented in Ref. [15]. The 
first two columns list the gauge coupling and approximate lattice spacing. The next two columns 
list the strange mass and F p 4 S in lattice units. The lattice spacing from F p ^ s = 153.90(9) (l^g) 
MeV, in a mass-independent scheme, is listed in the fifth column. The final column tabulates the 
strong coupling constant a s determined from the taste splittings (see the text). For am s and a, 
the errors are the sum in quadrature of statistical and systematic errors. Only statistical errors 
are shown for aF p ^ s . 


p 

a(fm) 

am s 

nFp4s 

a(fm) 

CXs 

5.80 

6.00 

6.30 

6.72 

0.15 

0.12 

0.09 

0.06 

0.06863(1^) 

0.05304(1^) 

0.03631(+ 2 ?) 
0.02182(1}]]) 

0.119376(71) 

0.095403(56) 

0.068570(38) 

0.044237(25) 

0.15305(1;}}) 

0.12232(1*1) 

0.08791(1^) 

0.05672(1^) 

0.58801 

0.53796 

0.43356 

0.29985 


B. Measurements of gradient-flow scales 

Tables IV and V show the results for y/t 0t ori g/a, u^orig /a, \ft 0pmp /a, and w 0)imp /a on the 
HISQ ensembles. The scales a/£ o,im P /a and wo,im P /a were improved to 0(a 8 ) at tree level 
using Eqs. (9) and (10) and the coefficients calculated in Ref. [13] for Symanzik-Symanzik- 
Clover. For the ensembles with the smallest lattice volumes, all configurations are included 
in the computation. As the volumes and cost become larger, a fraction of the configurations 
are run. The configurations in each subset are spaced uniformly across the ensembles, with 
spacings chosen to help reduce autocorrelations. The total number of generated configu¬ 
rations, number of configurations in the gradient-flow calculation, and molecular-dynamics 
time separation between the included configurations are also tabulated for each ensemble in 
Tables IV and V. 

The error shown with each scale is statistical. It is determined by performing a jackknife 
analysis over the included subset of configurations in each ensemble. The jackknife bin size 
is set to be at least twice the integrated autocorrelation length of the energy density, which 
is determined in Sec. IIIB2. In many cases the bin size is larger than would be naively 
estimated by increasing the bin size until the statistical error plateaus, which is further 
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TABLE IV. Values of the gradient-flow scales on the physical strange-quark HISQ ensembles 
listed in Table I. The first two columns are the approximate lattice spacing and the ratio of light 
to strange sea-quark masses, with the lattice dimensions appended as needed to identify each 
ensemble uniquely. The next column shows the ratio of the number of configurations included in 
the gradient-flow calculation to the number of configurations in the ensemble. The fourth column 
lists the molecular-dynamics time separation t between configurations included in the gradient-flow 
calculation. Multiple values are listed for cases where independent streams of the same ensemble 
did not use the same r. 


cs a(fm) 

m'Jm's 

Nsim / Ngen 

T 

\Ao,orig/ a 

^0,orig/ ® 

\/to,imp/ O, 

'^0,imp/ Oi 

0.15 

1/5 

1020/1020 

5 

1.1004(05) 

1.1221(08) 

0.9857(04) 

1.1069(10) 

0.15 

1/10 

1000/1000 

5 

1.1092(03) 

1.1381(05) 

0.9932(02) 

1.1258(06) 

0.15 

1/27 

999/1000 

5 

1.1136(02) 

1.1468(04) 

0.9969(02) 

1.1361(04) 

0.12 

1/5 

1040/1040 

5 

1.3124(06) 

1.3835(10) 

1.2003(05) 

1.3870(11) 

0.12 

1/10 (32 3 x 

64) 999/1000 

5 

1.3228(04) 

1.4047(09) 

1.2100(04) 

1.4096(09) 

0.12 

1/10 (40 3 x 

64) 1000/1028 

5 

1.3226(03) 

1.4041(06) 

1.2098(03) 

1.4089(06) 

0.12 

1/27 

34/999 

140 

1.3285(05) 

1.4168(10) 

1.2152(05) 

1.4225(11) 

0.09 

1/5 

102/1011 

50,60 

1.7227(08) 

1.8957(15) 

1.6280(08) 

1.9053(16) 

0.09 

1/10 

119/1000 

36 

1.7376(05) 

1.9299(12) 

1.6423(05) 

1.9406(12) 

0.09 

1/27 

67/1031 

32,48 

1.7435(05) 

1.9470(13) 

1.6478(05) 

1.9583(13) 

0.06 

1/5 

127/1016 

48 

2.5314(13) 

2.8956(33) 

2.4618(12) 

2.9049(33) 

0.06 

1/10 

38/1166 

96 

2.5510(14) 

2.9478(31) 

2.4810(14) 

2.9582(30) 

0.06 

1/27 

49/583 

48 

2.5833(07) 

3.0119(19) 

2.5133(07) 

3.0223(19) 


evidence for the conservative nature of our estimates of autocorrelation lengths. 

Considering the low cost and the ease of computation, we originally intended to analyze all 
configurations from the HISQ ensembles. However, the desired statistical accuracy is often 
reached well before an entire ensemble is analyzed, and the cost, although low compared to 
configuration generation, is significant enough that analyzing all configurations would be an 
inefficient use of resources at present. If higher-precision scales are needed in the future, it 
would be straightforward to complete the analysis on the full ensembles. 
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TABLE V. Values of the gradient-flow scales on the HISQ lighter-than-physical strange-quark 
ensembles listed in Table II. The first two columns are identical to those in Table II and are used 
to identify the ensembles. The latter six columns are equivalent to those in Table IV. 



m' s /m s 

Nsim /gen 

r 

\/to,orig/ O 

U’OjOrig/ d 

\Z^0,imp/ CL 

^Opmp/ CL 

0.10 

0.10 

102/1020 

20 

1.3596(06) 

1.4833(13) 

1.2441(06) 

1.4932(13) 

0.10 

0.25 

204/1020 

20 

1.3528(04) 

1.4676(10) 

1.2378(04) 

1.4764(10) 

0.10 

0.45 

205/1020 

20 

1.3438(05) 

1.4470(10) 

1.2296(05) 

1.4544(11) 

0.10 

0.60 

107/1020 

20 

1.3384(08) 

1.4351(16) 

1.2247(07) 

1.4418(17) 

0.175 

0.45 

133/1020 

20 

1.3385(05) 

1.4349(13) 

1.2248(05) 

1.4415(14) 

0.20 

0.60 

255/1020 

20 

1.3297(06) 

1.4170(12) 

1.2166(06) 

1.4225(12) 

0.25 

0.25 

255/1020 

20 

1.3374(07) 

1.4336(14) 

1.2236(06) 

1.4402(15) 


1. Comparison of RHMC and RHMD 

As discussed in Ref. [3], two generation algorithms were employed for the HISQ en¬ 
sembles: rational hybrid Monte Carlo (RHMC) and molecular dynamics (RHMD). As a 
check of the consistency of these two algorithms, we compute the ratio of wo computed on 
RHMC-generated configurations divided by w 0 computed on RHMD-generated configura¬ 
tions for the same bare gauge coupling and quark masses. For a ~ 0.09 fm, m'Jm' s ~ 1/27, 
the ratio is Wq HMC /wq HMD = 1.0009(12). For a ~ 0.06 fm, rn[/rn' s « 1/10, the ratio is 
W RHMC jyjRHMD _ 1.0002(26). For some configuration streams the pattern of fluctuations of 
Wo/a with molecular-dynamics time is not sufficient to reliably estimate the mean and stan¬ 
dard deviation over that single stream. However, in the particular cases used for calculating 
the ratio, this issue is not evident. Figure 1 shows the fluctuations of the relevant streams 
for each ratio. For all fluctuations of Wo/a on a single stream, the length of the fluctuation 
in molecular-dynamics time units is small compared to the entire molecular-dynamics time 
span of the stream. 


2. Autocorrelation lengths 

The autocorrelation function T°(r) of an observable O is defined as the normalized 
correlation function of O with itself after the elapse of molecular-dynamics time r. Given the 
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FIG. 1. The scale wo/a measured on individual configurations as a function of the simulation 
time in molecular-dynamics time units. Configuration streams generated with RHMC and RHMD 
are represented by solid-red and dashed-blue lines, respectively. 
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number of configurations N, the ensemble average of the observable 6 , and a measurement 
of the observable O t on configuration i, T °(r) is calculated by 

T ° (r) = ^5} C ° (r) = VV7 £ (°< - <5) (0.+, - <5) . (13) 

The integrated autocorrelation length l is the integral of the autocorrelation function for all 
cases where r > 0. This integral is often estimated as a finite sum using the trapezoidal rule 
and a cutoff r < r cut , as shown in Ref. [22]: 

/ OO 1 Jcut 

T 0 (r)<irs S - + ^T 0 (r). (14) 

T= 0 

The cutoff r cut is justihed because the autocorrelation function typically decays to 0 as a 
function of r while the statistical noise increases with r. 

To compute statistical errors in the autocorrelation function and integrated autocorre¬ 
lation length, two independent methods are employed: jackknife and the approximations 
outlined by Madras and Sokal in Ref. [22], For the jackknife method, the ensemble’s con¬ 
figurations are divided into distinct bins of b adjacent configurations, and the jth jackknife 
subensemble is defined to be the set of all configurations not contained in bin j. The 
autocorrelation function T f(r) and the integrated autocorrelation length lj for the jth 
subensemble is computed exactly as for the entire ensemble, except that any contributions 
involving a configuration from the bin in question are dropped from the sum, and the factor 
of N — t is decreased accordingly. Finally, the sample variance for T c:i and l is estimated by 
measuring the variance over the set of jackknife subensembles. Defining the number of jack¬ 
knife subensembles to be M = N/b, the variance in any quantity x that can be calculated 
on individual configurations is, from standard jackknife analysis, 


cr 


2 

X 


M -l 
M 


M 

-x) 2 . 

3 = 1 


(15) 


In applying Eq. (15) to the autocorrelation function, which cannot be estimated from an 
individual configuration, we neglect complications from pairs of configurations between dif¬ 
ferent bins. This leads to corrections to Eq. (15) of 0(r/b), which can be neglected if the 
bin size is chosen to be large enough that b r cut . Intuitively this makes sense because, for 
sufficiently large sample and bin sizes, the jackknife calculation is similar to breaking one 
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large experiment into several smaller, mostly independent experiments. As long as the au¬ 
tocorrelation function can still be computed over its entire domain within any of the smaller 
experiments, the analogy still holds. 

As an additional check that the standard jackknife formulas apply, we used the Metropolis- 
Hastings algorithm to generate independent streams of Gaussian-distributed real numbers 
at fixed stream size N. By independently varying the stream size and the number of inde¬ 
pendent streams within a set, we verified that for a large N and b > r cut both the jackknife 
procedure and the equations of Madras and Sokal yield approximations of the sample vari¬ 
ance of T° and l that agree, within statistical errors, with each other and with the true 
variance of T° and l. 

The second method we employ to estimate the statistical error in T c ° and l is the one 
developed in Ref. [22]. 1 The approximations to the variance neglect 0(l/N ) effects and, 
for T°, are slower to compute than the estimates from jackknife. However, the jackknife 
procedure relics on finding a bin size such that l b N , which can be tricky for small 
samples with large correlations. In the end, we decide to employ and compare both methods 
because each will introduce different errors as the sample size decreases and the correlation 
increases. 

We compute the autocorrelation function of (E(t, r)) as a function of the flow time t and 
the number, r, of molecular-dynamics time units separating configurations. Figure 2 shows 
examples of the autocorrelation function of (E(t, r)) at fixed flow time t ~ for ensembles 
at a ~ 0.12 and 0.09 fm. For the ensembles at a ~ 0.15 and 0.12 fm, where the full ensembles 
have been analyzed, we have a reliable estimate of the statistical error of the autocorrelation 
function for all values of r. For the finer lattice spacings a ~ 0.09, 0.06 fm, estimating the 
autocorrelation functions for r smaller than the separations listed in Tables IV and V is 
impossible without calculating the gradient flow on more configurations. To address this, 
we have analyzed an additional 50 equilibrated configurations separated with r = 24 from 
the m[/m s = 1/10, a = 0.09 ensemble. Most of these configurations are not included in the 
calculation of the gradient-flow scales; we keep the configurations used for computing the 
scales uniformly spread over each ensemble, with constant separation in r. With our limited 
statistics on the a ~ 0.06 fm ensembles, we are unable to get useful information on /, and 
1 Note, that, unlike Madras and Sokal, we call the lag in simulation (molecular-dynamics) time r, rather 
that t, which is used here for flow time. 
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we therefore drop those ensembles from further consideration in this subsection. 

Once the autocorrelation function of (. E(t )) is computed, we integrate the function over 
the separation r for each step in flow time t. The statistical error in Z is then estimated 
either using jackknife or the formulas from Madras and Sokal. For the coarser a ~ 0.15 and 
0.12 fin ensembles, where b can be chosen to be bigger than r cut , we find the two estimates 
agree well with each other (as implied in the top plot of Fig. 2). We choose to use the 
jackknife estimate, which can be computed more rapidly. To ensure the bin size used in 
the jackknife procedure is sufficiently large, we first use a bin size b large enough that the 
statistical error in ( E{t,r )) is (approximately) unchanged with further increases in bin size. 
After determining a value for Z and a total error cq, we then repeat the calculation with the 
smallest bin size that obeys b > 21 and evenly divides the sample size. If the new central 
value and error estimate leads to values of Z that do not satisfy this condition, then the bin 
size is further increased, and this procedure is repeated until the condition is met. For the 
finer a ~ 0.09 ensemble, a bin size cannot be chosen that falls well between r cut and the 
sample size. So, we choose to use the method of Ref. [22] which yields slightly larger errors 
(by about 20%). 

After calculating the statistical error in Z, the bias from introducing r cut must also be 
accounted for. Here we use a slight elaboration on the automatic windowing algorithm 
mentioned in Ref. [22]: r cut is selected to be the lowest value possible that satisfies r cut > 
cZ(r cu t) for an appropriate choice of c. Once c is chosen, the remaining bias is approximately 
equal to T 0 (r cut )Z(r cu t). In Ref. [22] a value of c = 10 was empirically found to yield an 
acceptable balance of statistical noise and bias; however, our samples are significantly noisier 
so a smaller value of c is appropriate. With this in mind, we use the following strategy to 
come up with our final choice of r cut . First, we identify the smallest value of c = c m i n where 
T°(r cu t) is consistent with zero within statistical error. We then choose the value of c within 
the range c m ; n < c < 2c m ; n that yields the highest Z. For the a ~ 0.15 and 0.12 fm ensembles, 
we fold C min = 4 and c = 8, because the estimates of the autocorrelation functions stay small 
but positive for a significant range of r even after they are first consistent with zero. For 
the a ~ 0.09 fm ensemble, we fold c min = 2 and c = 2. This is because the estimate of 
the autocorrelation function in this case is much noisier and happens to become negative 
(although consistent with 0) almost immediately after first reaching zero. 

The integrated autocorrelation lengths with statistical error and the estimated bias com- 
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t (M.D.) 


FIG. 2. The autocorrelation function of ( E(t )} plotted as a function of the separation between 
configurations r in molecular-dynamics time units. For both figures the flow time t is fixed near 
the value of u’q. The top plot is for the larger-volume ensemble with a ~ 0.12 fm, m[ = m' s / 10. For 
visibility, only every fifth point in r is shown. The bottom plot is for the a ~ 0.09 fm, m\ = m' s / 10 
ensemble. Estimates of the standard error using jackknife and the formulas in Ref. [22] are present 
for each data point, with the latter shifted slightly right for visibility. The lower number of analyzed 
configurations on the finer ensemble leads to a relatively larger statistical error and step size in 
r. In both plots the vertical dashed lines correspond to the smallest and largest values of r cut 
considered for each ensemble (see the text). 
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bined in quadrature are plotted in Fig. 3. Notice the autocorrelation length for (. E(t )) 
appears to asymptotically increase for increasing flow times, as expected for a smooth¬ 
ing operation. The central estimate of the integrated autocorrelation length at large flow 
times is 58 molecular-dynamics time units for the a ~ 0.09 fm, m' l = m' s / 10, physical 
strange-quark mass ensemble. In comparison, the integrated autocorrelation length of the 
topological charge appears to be roughly 40 molecular-dynamics time units for the a m 0.09 
fm, m] = m' s / 5, physical strange-quark mass ensembles [3]. This suggests the autocorrela¬ 
tion length for (. E(t )) at large flow times is comparable to the autocorrelation length of the 
topological charge. 


3. Charm-quark mass mistuning 

Mistunings of the charm-quark mass on our ensembles vary between 1% and 11%. It is 
therefore important to account for the corrections in the charm-quark mass to the quan¬ 
tities we consider. Heavy-quark effects on low-energy quantities come from effects on the 

/oN 

scale Aq^ d as well as higher-order, physical corrections in powers of l/m c . Applying only 

/oN 

the leading-order corrections from the effect on Aqc D is sufficient for first estimates. How¬ 
ever, the higher precision of the full continuum extrapolation and quark-mass interpolation 
requires us to account for the next-to-leading-order (NLO) contributions, namely the first 
power corrections in 1 jm c . Since the implementation of NLO contributions primarily en¬ 
ters in the full analysis, we defer most of the discussion until Sec. Ill C and focus here on 
leading-order effects from the scale Aq< 1 d . 

If a dimensionless ratio is made of any two quantities evaluated at the same charm- 

(3) 

quark mass and with the same dependence on Aq^ d , then this dependence will cancel in 
the ratio. However, low-energy quantities may also depend on the light-quark masses, which 

/qN 

means they may have different dependence on Aqq D , even close to the chiral limit. Thus, 
ratios of low-energy quantities may have leading dependence on m c from the leftover scale 
dependence. In this analysis we scale all the gradient-flow scales and the meson masses 
aM 1 r, ciMr- by the pseudoscalar decay constant aF pis , whose values, adjusted to physical 
sea-quark masses (including physical m c ), are given in Ref. [15]. Since, for small light-quark 
masses, F p 4 S is proportional to Aq^ d and the meson masses are proportional to (Aqq D ) 1//2 ; the 
meson masses must be adjusted to physical m c to eliminate the leading-order m c dependence 
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FIG. 3. The integrated autocorrelation length (in molecular-dynamics time units) as a function 
of flow time for ensembles with m'Jm' s = 0.1 and different lattice spacings. The thickness of 
the colored regions show the full range of the la errors, obtained by adding, in quadrature, the 
statistical error and bias due to r cu t- Dashed vertical lines denote the flow times that determine 
u>o, 0 rig on each ensemble where the color of the line matches the color of the shaded region. For early 
flow times on the a ~ 0.09 fm ensemble the integrated autocorrelation length is not plotted because 
the smallest step in r between configurations is insufficient to resolve the dominant contributions 
to the autocorrelation function. 

through Ag> D . The gradient-flow scales (in MeV) are also proportional to Aq^, d (with quite 
small sea-quark mass dependence); therefore, scaling by aF pis evaluated at the same charm- 

/o\ 

quark mass will cancel the leading-order m c dependence through Aq^ d . To make sure aF p4s 
and the gradient-flow scales are evaluated at the same charm-quark mass either aF p 4 S has to 
be readjusted back to the simulation value m' c or the gradient-flow scales have to be adjusted 
to physical m c . In this case, we choose to readjust aF p4:S to m' c , since its derivative with 
respect to m' c , dF 2 /dm c = 0.00554(85) in pJ^s units, has already been computed from the 
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TABLE VI. Results for adjusted meson masses and aF p 4 S , on the physical strange-quark ensembles 
listed in Tables I and IV. The adjustments correct for leading-order charm-mass effects, as explained 
in the text. The first two columns are the approximate lattice spacing and ratio of light- to strange- 
sea-quark mass, with the lattice dimensions appended as needed to uniquely identify each ensemble. 
The next two columns list the masses aM. w and aMx adjusted to the physical charm-quark mass, 
with the associated statistical error in parentheses and the change from the data before leading- 
order charm-quark mass adjustment in square brackets. The last column lists the decay constant 
aF p ^ s (with statistical error) adjusted back to the simulation value am ' c , while the physical values 
for the two lighter quark masses are held fixed. 


« a (fm) 

m'i/m' s 

aM n 

uMk 

aFp^ 

0.15 

1/5 

0.23619(22)[—34] 

0.40204(25) [-57] 

0.11976(7) 

0.15 

1/10 

0.16598(10)[—16] 

0.38030(16)[—37] 

0.11964(7) 

0.15 

1/27 

0.10169(09)[—11] 

0.37051 (16) [—41] 

0.11967(7) 

0.12 

1/5 

0.18904(15)[—13] 

0.32335(20) [-22] 

0.09555(6) 

0.12 

1/10 (32 3 x 

:64) 0.13420(09) [-04] 

0.30804(15)[—09] 

0.09546(6) 

0.12 

1/10 (40 3 x 

64) 0.13396(06) [-04] 

0.30812(09) [-09] 

0.09546(6) 

0.12 

1/27 

0.08151(04) [-02] 

0.29843(11)[—08] 

0.09546(6) 

0.09 

1/5 

0.14039(17)[—16] 

0.24033(18) [-27] 

0.06874(4) 

0.09 

1/10 

0.09849(08) [-03] 

0.22681(12)[—07] 

0.06861(4) 

0.09 

1/27 

0.05719(04) [-03] 

0.21936(09) [-10] 

0.06864(4) 

0.06 

1/5 

0.09400(16) [-38] 

0.16125(16) [—65] 

0.04465(3) 

0.06 

1/10 

0.06686(06) [-27] 

0.15390(09) [-62] 

0.04465(3) 

0.06 

1/27 

0.03885(03) [-02] 

0.14262(06)[—07] 

0.04429(3) 


lattice data in Ref. [15]. For the ratios of aM n and aM K to aF p4s , we keep the physical-m c 
values of aF p4s from Ref. [15] and adjust aM. n and aMx to the physical value of m c , using 
the derivative dM 2 /dm c = 0.0209(41) calculated in pjs units [15]. The values of aM 7r , aMx , 
and aF P 4 S after charm-quark-mass adjustments are listed in Tables VI and VII. 

It is instructive to compare these results to the leading-order m c effect on Aq^ d ex¬ 
pected from perturbation theory. Defining the ratio P(m c / Aq<I d ) = Aqq D /Aq^ d , a 
renormalization-group invariant r](m c ) can be constructed from the logarithmic derivative 
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TABLE VII. Results for adjusted meson masses and the decay constant aF p 4 S , on the lighter-than- 
physical strange-quark ensembles listed in Tables II and V. The adjustments correct for charm-mass 
mistunings, as explained in the text. The first two columns are identical to those inTable II and 
are used to identify the ensembles. The latter three columns are equivalent to those in Table VI. 


m'Jrris 

m' s /m s 

aM n 

aMx 


0.10 

0.10 

0.13177(10) [—04] 

0.13177 (10) [—04] 

0.09546(6) 

0.10 

0.25 

0.13247(09) [-04] 

0.17380(11) [—05] 

0.09546(6) 

0.10 

0.45 

0.13271(10)[—04] 

0.21713(12) [—06] 

0.09546(6) 

0.10 

0.60 

0.13320(10)[—04] 

0.24502(13)[—07] 

0.09546(6) 

0.175 

0.45 

0.17487(10)[—05] 

0.23192(12) [—07] 

0.09546(6) 

0.20 

0.60 

0.18837(17) [—13] 

0.26364(18) [-18] 

0.09555(6) 

0.25 

0.25 

0.20883(19) [—21] 

0.20883(19)[—21] 

0.09561(6) 


of P [23] 


V(m c ) = 


m r 


P' 


A (4 ) P ’ 

iv QCD 


(16) 


with P' being the derivative of P with respect to its argument. At leading perturbative 
order, 77 ~ ijo = 2/21 [24, 25], and 


dm c 



9 A (3) 

Z iv QCD 

27 m c 


(17) 


Then, given Q = /c(Aq^ d ) p , where k and p are independent of m c , the partial derivative of 
Q with respect to m c at leading order in perturbation theory (and neglecting physical, NLO 
corrections in 1 /m c ) is 


— = kv (a (3) Y 1 . 

dm c V Q CD y dm c 27 m c 


(18) 


As mentioned in Ref. [15], the results of this formula for dF 2 /dm c and DM 2 jdnn c agree, 
within 10 %, with the numerical determination of these derivatives from our lattice data. 
Also, we find that the dimensionless product of aF P 4 S with the gradient-flow scales is ap¬ 
proximately the same whether aF p 4 S is readjusted to m' c using the numerically estimated 
derivative or the gradient-flow scales are adjusted to the physical value of m c using Eq. (18). 
The largest difference between the two approaches is 1.7cr stat , where <7 sta t is the statistical 
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error, and occurs on the a ~ 0.06 fin, m! l = m ' s /5 ensemble for the dimensionless combination 

V^0,orig Ap4s ■ 

We account for the remaining physical, NLO corrections in powers of 1 jm c by directly 
including such terms in the fits to a/^o, origins, Wo,orig-Fp 4 S , and w 0jimp F pis . Spe- 

cihc details of what powers are included and how the terms are constrained is detailed in 
Sec. Ill C. The effects of the NLO charm-mass corrections to the meson masses aM n /aF p4:S 
and aMx/aFp^ on the gradient-flow scales are negligible because these are quite small cor¬ 
rections and the dependence of the gradient-flow scales on aM n /aF p4:S and aMx/aF p4s is 
already weak. 

4- Simple continuum extrapolation 

A simple continuum extrapolation can be quickly performed by including only the physical 
quark-mass ensembles. With just these ensembles, light-quark, strange-quark, and NLO 
charm-quark-mass mistiming effects cannot be accounted for, and the statistical error will 
be larger than from a fit to the complete data set. Nevertheless, this extrapolation is useful 
because it provides a check on the final value from the more complicated fits and highlights 
the degree of improvement in discretization errors of Wo )OV i g F p4:S over ^t 0tOI i g F P 4 S , as well as 
V^o.imp F pAs and w apmp F p4s over the originals y/t^~F p4s and w 0 ,origF p4 , s . 

To perform the continuum extrapolation we multiply by the values of aF p 4 S listed in 
Table III to create a dimensionless quantity that is finite in the continuum limit. We choose 
aF P 4 s to keep the statistical errors smaller than what they would be from an experimentally 
accessible quantity such as f w . To convert the final result to physical units, however, we 
must use F pAs = 153.90(09)(log) MeV, which was computed with the scale set by af n . The 
advantage of using aF p4s to set the intermediate scale is that it yields smaller relative scale 
errors from different ensembles, and thus aids in the extrapolation to the continuum. 

Plots of to or \gF p 4 S and w 0 ^ orig F p 4 S as a function of a 2 are shown in Fig. 4. The discretiza¬ 
tion improvement of Wo,orig over is immediately evident in the differences between 

the coarsest and finest ensembles. This result holds for many choices of the reference scale, 
including af n , r i/a, ^/t 0pmp /a, and Wo,im P /a, in addition to aF P 4 S , the choice used in Fig. 4. 
In addition, the plot shows that the a 2 dependence is not trivial for WqF P 4 S . This is not 
unexpected because we are using a highly improved configuration action (which directly 
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FIG. 4. Simple continuum extrapolations for y / to,origFp 4 , s and WQ, or \ g F p ^ s over physical quark-mass 
ensembles only. Statistical error bars are present, but they are nearly invisible on this scale. Three 
fits to each data set are shown. The red, dot-dashed line is a linear fit in a 2 to the three finer 
ensembles (a < 0.15 frn), the blue dashed line is a linear fit in a? to all four ensembles, and the 
green solid line is a quadratic fit in a 2 to all four ensembles. The continuum extrapolation points, 
calculated from -y/to.imp and rco,i m p> are shown in magenta with error bars representing the sum of 
statistical and systematic uncertainties in quadrature. 
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FIG. 5. Simple continuum extrapolations for the original (\/to,orig and wo, 0 rig) and improved 
(V^o.imp and wo,imp) gradient-flow scale times F p over only physical quark-mass ensembles. 
Quadratic fits in a 2 or a s a 2 over all four physical-mass ensembles are shown for the original and 
improved scales, respectively. The continuum extrapolation points, calculated from the improved 
scales, are shown in black with error bars representing the sum of statistical and systematic uncer¬ 
tainties in quadrature. 

affects aF P 4 S ) for a statistically precise measurement. The importance of higher-order terms 
in a 2 and a s a 2 can be seen directly in the differences between the improved and original wq, 
as well as the difference between ^/to.orig and wo,ori g - The situation is further complicated 
by effects of quark-mass mistunings between ensembles with approximately the same ratio 
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mi/m s . This is explored in more detail in the full fit analysis in Sec. Ill C 2. For now, we 
include linear fits in a 2 with or without the coarsest a ~ 0.15 fm ensemble and quadratic 
fits in a 2 to all four ensembles. 

Figure 5 compares the improved scales y/io,imp and u>o,imp with the original ones y/t 0iOrig 
and wo,ori g - As before, we consider linear fits with or without the coarsest (a ~ 0.15 fm) 
ensemble, and quadratic fits with all four ensembles. For the unimproved scales the fit 
curves are functions of a 2 ; for the improved scales they are functions of a s a 2 , since tree-level 
discretization errors have been removed. The improvement at tree level is clear for y/f^, 
where the a s a 2 dependence of y/tgimp is close to linear, and the slope is considerably less 
steep than for y/t 0 , O rig- The difference between Wo.orig and u^imp is much smaller, and is 
contaminated here by mistiming effects, so we postpone discussion until after we correct for 
such mistunings. 

The continuum values are extracted from the quadratic fit in a s a 2 to the full data set on 
the improved scales. The systematic error from the extrapolation is estimated by the largest 
differences between this fit and the other fits considered. This yields the simple estimates 
for the gradient-flow scales \JTq = 0.1419(2)(1 1 4) fm and w 0 = 0.1710(4)(i t i 1 2) fm. Here we 
do not include errors (statistical or systematic) from the determination of F p4s so that we 
can make a cleaner comparison with the extrapolations over the full data set (nonphysical 
quark masses included) in the next section. 


C. Full continuum extrapolation 

Using all of the ensembles listed in Tables I and II, we now perform a combined contin¬ 
uum extrapolation and interpolation to physical quark masses. Compared with the simple 
continuum extrapolation over the physical quark-mass ensembles only, the full approach has 
greater statistics, provides a handle for precise tuning of the light-quark and strange-quark 
masses to their physical values, and allows for better control and analysis of the systematic 
errors from discretization effects. 

We break the analysis into two main sections. First, the functional forms and parameter 
variations for controlling mass and lattice-spacing dependence are outlined. Second, we 
present the results from our fits of the lattice data to the models from the first section. 






1. Models of mass and lattice-spacing dependence 


To perform the combined continuum extrapolation/quark-mass interpolation there are 
three functional forms that must be chosen: quark-mass terms, lattice-spacing terms, and 
terms that combine both (cross terms). 

For the light and strange mass dependence we use the chiral expansion outlined in 
Sec. IIB 1 with M n and M K as independent variables, standing in for the light- and strange- 
quark-mass dependence. For each fit we include the expansion up to LO (just a constant), 
NLO (which adds an analytic term linear in the squared meson masses, but no chiral log¬ 
arithms), or NNLO (chiral logarithms and terms up to quadratic in the squared meson 
masses). In the fits to Eq. (6), the rho meson mass is used for //, f n is used for / at NLO, 
and F pis is used for / at NNLO for convenience; other choices for these quantities would be 
equivalent up to redefinitions of fcj and the addition of terms of higher than NNLO order. 

For the NLO charm-quark-mass dependence, Ref. [23] argues that, for large m c , cor¬ 
rections start at 0(1/ml). In our central fits we therefore add a term proportional to 
1/m' 2 — 1/m 2 (primes denote simulation values, unprimed quantities denote physical val¬ 
ues). However, in the lighter-than-charm region where Ref. [23] performed simulations, their 
data actually was better described by l/m c dependence than by 1/m 2 . Although all our 
values of m! c are closer to m c than those of Ref. [23], we also consider fits that replace 
l/m/ c 2 — 1/m 2 by 1 /m' c — l/m c in estimating systematic errors. 

For the lattice-spacing dependence we use a Ta.ylor-series ansatz in powers of a 2 , a s a 2 , 
and a 2 a 2 . We include powers of a s because the leading errors coming from the action in 
a joint expansion in a 2 and a s is a s a 2 and the leading taste-violating errors is a 2 a 2 . For 
the original scales i/fo,ori g and u>o,orig, the first order term in lattice spacing, a 2 , is always 
included. Higher orders are optionally included up to a 6 , a s a 2 , and a 2 a 2 . For the improved 
scales -y/fcqmp and Wo.imp, the first order in either a s a 2 or a 4 is always included. Higher 
orders are optionally included up to a 8 , ( a s a 2 ) 2 , and a 2 a 2 . Even though the scales y/t 0imp 
and tco.imp are improved to order a 8 at tree level, the a 4 through a 8 terms are included for 
fits because aF p4s has leading corrections of a s a 2 and a 4 . For both scales, the number of 
lattice-spacing terms in a single fit is not allowed to exceed 3. Together with the value of 
the scale in the continuum limit, this ensures that at most four parameters describe the a 
dependence of the data from our four unique lattice spacings. 
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In order to limit the large number of cross terms possible, we only include products of 
chiral and lattice-spacing terms whose total “order” is no higher than the largest noncross 
term included in the fit function. Here by order we simply mean the total power of any 
of the following factors, which all have similar magnitudes for the HISQ ensembles: a s ~ 
(AqcdcO 2 rs./ (M/(47r/)) 2 . Also, no cross terms are constructed from the highest orders of 
mass or lattice-spacing terms. For example, a fit including a 6 and the chiral expansion to 
NNLO would include a term like a 4 (M / (An f)) 2 but not a 2 (M/(47r/)) 4 . 

Once the functional form is chosen, we also consider various restrictions of the data set. 
As already suggested from the naive fit to the physical quark-mass ensembles only, the 
a ~ 0.15 fm ensembles may require higher orders of a 2 to be included. So we consider fits 
that include or drop these ensembles. Furthermore, when the a ~ 0.15 fin ensembles are 
dropped, we do not include more than two lattice spacing terms to ensure the three unique 
lattice spacings represented by the data set are parametrized by three or fewer variables. A 
second restriction on the data set is determined by the kaon mass. The lighter-than-physical 
strange-quark ensembles have strange-quark masses all the way down to 1/10 the physical 
strange-quark mass. Including these ensembles along with the physical-mass ensembles that 
comprise most of out data requires more complex chiral forms to cover the large range in 
m' s . We therefore consider eight different lower bounds for the kaon masses included in the 
fit, ranging from just below the physical strange-quark mass, to near zero, which includes 
all the ensembles. We do not set an upper bound for the kaon mass, as would be typical 
of chiral-perturbation-theory extrapolations, because this would only leave a ~ 0.12 fm 
ensembles for the extrapolation. 

We add Gaussian priors centered around zero to ensure the magnitudes of fit parameters 
are physically plausible; we refer to the standard deviation of the Gaussian as the prior 
width. For discretization terms of the form fc(a/?a 2 AQ CD ) n , the dimensionless coefficient k 
is presumed to be of order unity so that the finite Tayler-series expansion in a 2 , a s a 2 , and 
a 2 a 2 is justified. When reexpressed in terms of the two dimensionless quantities, 

Y , = y = hAi (19) 

*- ud 3^2 J2’ g n 2 J2’ V > 

the coefficients of the terms from chiral-perturbation theory are also expected to be of order 
unity. 2 A prior width of 1 in these units is in most cases sufficient to ensure that the data, 

2 Prior widths on cross terms are set to the product of the widths associated with each of the factors. 
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rather than the prior assumption, is constraining a given parameter, since the deviation of 
the parameter from zero is well within one prior width. Once the priors widths are increased 
to 3, this is true for all the discretization and chiral parameters, and most fit results are 
negligibly different from those with no prior constraints at all. The only exceptions are 
seven fits to ^/to,im P -^p 4 s whose continuum results differ by ~ 2cr stat from those without prior 
constraints. 

For the NLO charm-quark-mass corrections, the prior width is based on the results 
of Ref. [23], which finds that such heavy-quark effects on the gradient-flow scales are 
<0.3%. For a dimensionless ratio R(m c ), we choose the prior width such that | R(m c ) — 
R(oo)\/R(oo) < 0.5% or 1.5%. Most fits show negligible difference between a prior width of 
1.5% and a prior width set to infinity. However, the prior width does significantly constrain 
the m c dependence on a few outlying fits; without any prior constraints these fits would 
have shown differences of 4% to 5% between a physical m c and an infinite m c . We consider 
such a large m c dependence unreasonable and we believe we are justified in removing these 
few outliers using the prior constraints. It is probable that these large NLO l/m c power 
corrections are mimicking the dependence on other variables such as m s \ we note that 
the mistunings in m c are comparable to and correlated with the mistunings in m s on the 
physical strange-quark-mass ensembles. Another possibility is that the power corrections 
are making up for errors in the derivatives OF 2 jdm c and dM 2 / dm c . We have checked to 
see, however, that varying the derivatives by 2<r sta t does not produce significant variations 
in the continuum results. For this reason and others discussed in later sections, we do not 
widen the prior on NLO charm-quark-mass dependence any further than 1.5% in the final 
analysis. 

This leads us to consider two sets of Gaussian priors in our final analysis: one set with 
all prior widths set to the smaller choice (1 for discretization and chiral terms, and 0.5% for 
NLO charm-mass dependence) and another set with all widths set to the larger choice (3 for 
discretization and chiral terms, and 1.5% for NLO charm-mass dependence). Both sets of 
priors are in general wide enough that the parameters are determined by the data and not 
the priors (the deviation of the parameter from zero does not change an appreciable fraction 
of the original width when the width is increased by a factor of 3); the only exceptions 
are for the parameters determining NLO charm-mass dependence, and then only for a few 
outlying fits, as described earlier. 
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For all scales, there are three chiral expansions, eight choices of lower bound for the kaon 
mass, two choices for the next-to-leading-order charm-quark-mass correction, and two sets of 
priors. For the original scales -y/fo.orig and w 0 , O rig, there are six lattice-spacing expansions with 
the a ~ 0.15fm ensembles included and three lattice-spacing expansions with the a ~ 0.15 
fin ensembles not included. This produces a total of3x8x2x2x(6 + 3) = 864 different fits. 
For the improved scales i/fo,imp an d Wo,imp> there are nine lattice-spacing expansions with 
the a ~ 0.15fm ensembles included and five lattice-spacing expansions with the a ~ 0.15 fm 
ensembles not included. This produces a total of3x8x2x2x(9 + 5) = 1344 different fits. 


2. Fits to the lattice data 

We gauge the acceptability of each of the fits outlined in Sec. Ill C 1 using the p value of 
the fit. We also consider the number of degrees of freedom for each fit and the proximity 
of the fit curve to the data from our most important ensemble, the one with physical quark 
masses and a ~ 0.06 fm. This extra information is not used to restrict the set of fits, but 
allows us to better visualize their properties. Figure 6 shows the acceptability for the original 
and improved scales with the p value as the x axis, deviation from the physical a ~ 0.06 fm 
ensemble as the y axis, and the size (radius) of each data point proportional to the number 
of degrees of freedom. We define “acceptable” fits as those with p > 0.01. Acceptable fits 
are those to the right of the black line in Fig. 6. Note that, for all the scales considered, fits 
with acceptable p values are usually close to the result from the a ~ 0.06 fm physical-mass 
ensemble. For all the gradient-flow scales, no acceptable fit deviates from that result by 
more than 2cr stat . 

To determine a central value and systematic error from the choice of fit we construct 
histograms in Fig. 7 of the continuum results from fits with p > 0.01. A histogram method 
to determine systematic errors has been used previously by the BMW Collaboration [26] . A 
key distinction is that we do not treat the distribution as a kind of probability distribution, 
but simply treat all acceptable fits as realistic alternatives and take the largest positive 
and negative differences from the central fit as the systematic errors. For y/t o where the 
tree-level improvement produces a clear reduction in discretization errors (as discussed later 
in this section), we use only the histogram of the improved scales in the estimate 

of the systematic error. In other words, we use the full range of the fits shown in green in 
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FIG. 6. The “acceptability” for the various fits considered for the to scales (^/to,orig and \/io,impj 
top panel) and ico scales (wo.orig and too,imp; bottom panel). Fit acceptability is determined by 
the p value (x axis) and further illustrated by the proximity to the results from the physical-mass 
a ~ 0.06 fm ensemble in units of <J s tat (y axis). The size of the points is proportional to the number 
of degrees of freedom. The region to the right of the black line contains fits with 0.01 < p < 1.0 
and a deviation of less than 2er stat . This line determines the acceptable subset of fits considered in 
the subsequent analysis. The central fit chosen from this analysis is denoted by the star. 
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FIG. 7. Histograms of the continuum extrapolations for y/toF p 4 s (top panel) and wqF p ^ s (bottom 
panel) for all acceptable fits (see the text). Each histogram is a stacked combination of continuum 
extrapolations from the original (^/fo.orig and u?o,orig) and improved scales (^/to,imp and u’o.imp); 
represented by the red, hashed and green, solid bars, respectively. The box and error bars along 
the bottom denote the minimum, mean, maximum, and central 68% of the distribution. The 
vertical dashed line for each distribution marks the continuum result of the associated central fit. 
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Fig. 7, but do not consider the red outliers at the left of the histogram. For Wo the tree-level 
improvement does not clearly reduce the size of discretization effects, so we include w 0 as 
well as Wo,imp hi the systematic error estimate. Widening the prior on the NLO charm-mass 
dependence makes a noticeable shift in the continuum values of some of the outlying fits on 
both histograms, ffowever, widening the prior does not significantly change the continuum 
values for the central bulk of the histogram. For this reason, and because the widest prior 
width for NLO charm-mass corrections included in the histogram is large (~ 1.5%, more 
than three times what is estimated in Ref. [23]), we do not widen the prior further. 

For both y^ and Wo, the central fit is chosen by locating fits close to the median and 
mean with p > 0.1. If there are several fits that satisfy this criterion, fits with a larger 
number of degrees of freedom are chosen. For yfto, where there are very few fits with 
p > 0.1, this criterion is sufficient to determine the central fit. For Wo, there are a large 
number of fits satisfying this criterion; thus, we narrow the choice down by preferring fits 
with a s a 2 over a 4 , 1/m], 2 — 1/m 2 over 1/m], — l/m c , and the NNLO chiral expansion over 
the NLO expansion. The central fits to y/t o and Wo are both to the improved scales, include 
all but the three lightest m' s ensembles, use the 1/m], 2 — 1/m 2 NLO correction in m c , include 
the a s a 2 lattice-spacing term, exclude the coarsest a ~ 0.15 fm ensembles, and use the 
wider set of priors. The central fit for y/% only uses the chiral expansion to NLO and 
adds in the a 4 lattice-spacing term, resulting in five free parameters (with four priors— 
continuum values are never constrained by priors) and 14 data points. The central fit for 
w 0 includes the full NNLO chiral expansion but does not add in additional lattice-spacing 
terms, resulting in seven free parameters (with six priors) and 14 data points. For y/to, the 
central fit has y 2 /d.o.f = 14.0/9 and p = 0.14 from the data alone (he., with the standard 
or “unaugmented” definition of y 2 coming from data, and degrees of freedom equal to the 
number of data points minus the number of fit parameters). Including contributions from 
priors, the “augmented” y 2 /d.o.f = 15.2/13 and p = 0.31. The fit is 0.07cr higher than the 
result on the physical a ~ 0.06 fm ensemble. For wq, the central fit has y 2 /d.o.f = 3.0/7, 
p = 0.89 unaugmented and y 2 /d.o.f = 4.0/13, p = 0.99 augmented, and is 0.15<r higher 
than the result on the physical a fa 0.06 fm ensemble. The central fits are shown in Fig. 8. 
The dashed lines indicate how well the fit describes the data by showing the fit function 
evaluated at the same masses and lattice spacing as the data points. The three solid bands 
show the lattice-spacing dependence at fixed quark masses, tuned to a physical value for the 
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strange-quark mass and the indicated ratio of the light-quark to strange-quark mass. One 
clearly sees the effects of retuning the quark masses from their simulation values. 

Because there are a wide range of choices for the cental fit to wo, we include a description 
of an alternative fit, which is plotted in Fig. 9. The fit is similar to the central fit for 
wo previously mentioned, except it includes the lattice-spacing terms a 4 and a 2 a 2 and the 
coarsest ensembles at a ~ 0.15 fm. The fit has x 2 /d.o.f = 7.7/8, p = 0.47 unaugmented and 
X 2 /d.o.f = 9.64/16, p = 0.89 augmented, and is 0.18cr higher than the physical a ~ 0.06 fm 
ensemble. The addition of more lattice-spacing terms and the coarsest ensembles leads to 
a hook in the continuum extrapolation near a ~ 0.06 fm, which significantly increases the 
statistical error in the continuum result. For this reason we prefer the previously mentioned 
central fit for wq over this alternative. 

For the fit to \Ao,impFp 4 . s , the lattice-spacing dependence at finer lattice spacings (a < 0.09 
fm) is dominated by the a s a 2 contribution. The a 4 contributions start to become comparable 
to those from a s a 2 for a >0.12 and produce the curvature evident in Fig. 8 (top panel). 
The chosen central fit to Wo,im P T/ 4 S has only one lattice-spacing dependent term, a s a 2 , but 
it excludes the coarsest a ~ 0.15 fm ensembles. So, it is also worthwhile to examine the 
alternative fit’s lattice-spacing dependence since the a ~ 0.15 fm ensembles are included in 
this fit. The lattice-spacing dependence of Wop m pF p4:S in the alternative fit is milder than 
for v^o,imp-^p4s, but also more complicated. The majority contribution for all a is from 
a 2 a 2 , but only by a slim margin; the a 2 a 2 term is approximately 53% — 57% of the sum 
of the absolute value of all discretization terms. The contributions from a s a 2 and a 4 have 
comparable magnitudes and add together, canceling some of the contribution from a 2 a 2 . 
The cancellations are larger for a < 0.06 fm and a > 0.12 fm, causing the curvature seen 
in Fig. 9. Because the central fit to a/£o, impels and the alternative fit to w 0) i m pF p4:S have 
multiple discretization terms of comparable magnitudes for a > 0.12fm, we ensure higher- 
order terms are negligible by repeating these fits with the addition of the next highest terms 
in a 2 or a s a 2 . The continuum results for these modified fits do not significantly differ from 
the original fits. 

It is revealing to examine the central extrapolations plotted through only the physical- 
mass ensembles for all four gradient-flow scales, as was done in Fig. 5 for the naive extrapola¬ 
tion. This plot is presented in Fig. 10. Compared to the simpler fits to just the physical-mass 
ensembles in Sec. Ill B 4, quark-mass mistunings in the physical quark-mass ensembles are 
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FIG. 8. The central fits to the gradient-flow scales • v /to,imp-^p 4 s and wo,impTp4s> plotted as a 
function of a s a 2 . These are used to compute v^o (top panel) and wq (bottom panel) at physical 
quark masses and in the continuum, as indicated by the black stars. Only m! s ~ m s ensembles are 
plotted, but the fits include 0.25m s < m' s < m s ensembles. Dashed lines represent the fit through 
each ensemble’s actual quark masses and lattice spacing, while the solid bands are for varying 
lattice spacing at fixed quark masses retuned to the physical strange-quark mass and the ratio of 
m'Jm'g specified in the legend. 37 
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FIG. 9. An alternative fit to the gradient-flow scale uio,imp-F p 4 S , plotted as a function of a s a 2 . 
The value of wq at physical quark masses and in the continuum estimated from this fit is indicated 
by the black star. Only m' s ~ m s ensembles are plotted, but the fit includes 0.25m s < m' s < m s 
ensembles. Dashed lines represent the fit through each ensemble’s actual quark masses and lattice 
spacing, while the solid bands are for varying lattice spacing at fixed quark masses retuned to the 
physical strange-quark mass and the ratio of m!,/m' s specified in the legend. 

accounted for here. This leads the two coarsest physical-mass ensembles (a = 0.12 and 
a = 0.15 frn) to shift down when retuned to the precise ratio mi/m s = 1/27. For the fits to 
\Ao,origFp 4 s and ^tQ pmp F p4s the difference is visible but has only a small effect on the con¬ 
tinuum extrapolation. For the fits to u>o,origins and Wo^ mp F p4s the shift is very important, 
as the fluctuation in the data points across the range of a 2 is comparable to the size of the 
effect of the mass retuning. 

For both and Wq, the tree-level improved version of each scale eliminates a 2 errors 
(but not a s a 2 ) and reduces a 4 and a 6 contributions. The improvement in y %F p4:S is obvious 
in Fig. 10. For Wo, even after quark-mass retuning, the size of discretization effects for 
a < 0.06 fin in u’o,imp-F P 4 S is at best marginally smaller than in Wo j0r i g F P 4 S . Although numerical 
results cannot separate the effects of F p4s from w 0 , the lack of clear improvement between 
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FIG. 10. Continuum extrapolations for the original (-y/fo^rig and wo,orig) and improved (-y/topmp 
and wo, imp) gradient-flow scale times F p ^ s plotted for physical quark-mass ensembles only. All fits 
to the original, unimproved scales include the chiral expansion to NNLO, l/m 2 NLO charm-quark 
corrections, the wider set of priors, all four lattice spacings, and all but the three lightest m' s 
ensembles. For - v /to,orig^ ? p 4 s the fit is quadratic in a 2 . For Wo, 0 rig-Fp 4 s the fit is linear in a 2 and 
a s a 2 . For the improved scales the plotted lines are from the central fits discussed in this section. 
The continuum-extrapolation points are shown in black with error bars representing the statistical 
error only. 


wo and wo,i mp suggests that the dominant lattice artifacts in wq may not arise at tree level. 
Alternatively, the lattice artifacts from F p 4 S may be dominating the continuum extrapolation, 
making it difficult to resolve the differences between w 0 , O rig and w’ojmp- 
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IV. RESULTS 


A. Scales in physical units 

We compute our final estimate of the gradient-flow scales in physical units by evaluating 
the continuum-extrapolated, physical-quark-mass-interpolated value of y/toF p ± s and WoF p ^ s 
for the best fit in Sec. Ill C 2 and dividing by the physical value of F p ^ s (see Sec. III). 

Vto — 0.1416(2) s t a t ( ^3 ) j 0 rap ( ^ 2 ) F v n s , extrap ( 2 ) FV ( 3 ) /,, PDG f m (20) 

Wq = 0.1714(2) stat (+u)w o , e xt ra p(-2)F p 4 s ,extrap(2)Fv(3)f v PBG f m (21) 

The first error is statistical and is from the corresponding central fit discussed in 
Sec. Ill C 2. The remaining, systematic, errors are from continuum extrapolation/chiral 
interpolation (estimated by variations among fits), corresponding continuum and chiral er¬ 
rors on F pis in physical units, residual finite-volume effects on F p 4s , and the error in F pis 
from the experimental error in [27], respectively. The error from the choice of fit for the 
gradient-flow scale is estimated using the histograms in Fig. 7. We use the full range of fits 
to fo,imp-Pp 4 s for to and the full range of all fits for wq. The remaining extrapolation errors, 
residual finite-volume effects, and error from the experimental value of f n come directly 
from the analysis of F pis [15]. 

The results in Eqs. (20) and (21) may be compared to the earlier, simple estimates 
of y/to = 0 . 1419 ( 1 )( 1 1 4 ) fm and w 0 = 0.1710(4)(1^) fm from the physical quark-mass 
ensembles in Sec. IIIB4. For both y/% and wq, the extrapolated values agree, within the 
earlier systematic errors. [Note that the earlier result did not include the uncertainties 
from F p 4 S and f n , which give the last three errors in Eqs. (20) and (21).] For y/to, the 
central value from the simpler fit is slightly higher and both extrapolations lead to similar 
statistical uncertainties. The main improvement of extrapolating ^o,imp-fp 4 s over the full 
set of ensembles is the narrower systematic uncertainty in the continuum, physical-mass 
extrapolation. For wq, the central value from the simpler fit is slightly lower. This shift is 
attributable to the quark-mass retuning and higher-order discretization terms only accessible 
to the full extrapolation. This additional systematic control leads us to prefer this full 
analysis over the simple one, even though the total errors for WqF tAs are slightly larger in 
the full analysis. Overall, the addition of nonphysical quark-mass ensembles reduces some 
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uncertainties and improves control over the continuum extrapolation without significantly 
deviating from our initial estimate. 

The results presented in this work have evolved from preliminary results presented 
previously. In chronological order, the estimates from two earlier proceedings are Wq = 
0.1711(2)(8) fin in Ref. [16], and v^o = 0.1422(2)(5) fm and w 0 = 0.1732(4)(8) fm in 
Ref. [17]. For comparison to the results in this work, we have altered the original results 
by keeping only the statistical and systematic error from the choice of fit form to \t%F P 4 S 
or w 0 F p 4s . We have dropped all other systematic errors, which are shared across all results. 
For both scales, all results agree within 2cr of the current results. Compared to the result 
in Ref. [16], those in Ref. [17] account for leading-order charm-quark-mass mistunings, use 
aF P 4 S , instead of af n , to set the scale, and consider a larger selection of discretization terms. 
However Ref. [17] uses an incorrect value of am c for the physical quark mass, a ~ 0.06 
fm ensemble when adjusting for charm-quark-mass mistunings. The mistake is fixed in the 
current work and is responsible for most of the downward shift relative to the scales pre¬ 
sented in Ref. [17]. We have also updated the statistical errors from ciF p and now include 
the induced correlations from aF.\ ps among ensembles at the same (3. Finally, compared to 
Ref. [17], the current work incorporates the tree-level improved versions of each scale, re¬ 
fines the selection of discretization terms, includes next-to-leading-order charm-quark-mass 
corrections, and uses priors to constrain the fit parameters. 


B. Continuum meson-mass dependence 

Using the best fits and data sets chosen in Sec. Ill C 2, we determine the continuum 
meson-mass dependence of Wq under a mass-independent scale-setting scheme. The resulting 
function is useful for a prediction of the scales on future ensembles, as well as for explicit 
comparison of the mass dependence of w 0 to that of other scale-setting quantities. To predict 
a scale one measures w 0tOrig /a (or w 0pmp /a), aM n , and aM K on a subset of the ensemble to 
be generated. Then, by evaluating the function at the corresponding dimensionless variables 
P = ( WoM n ) 2 and K = ( WqMk) 2 one can determine the continuum value of wq in physical 
units at those masses, wo(P,K), and compute the resulting scale a = wo(P, K) /(iu 0 / a). 
This procedure was originally suggested in Ref. [5]. 

The functional form of the meson-mass dependence Wq(P,K) is chosen to be the same 
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as the chiral expansion to NNLO, in agreement with the best fit chosen in Sec. Ill C 2. The 
coefficients are determined by solving the implicit equation 

w 0 = f(P = ( w 0 M n ) 2 , I< = ( w 0 M k ) 2 ) (22) 

numerically for w 0 = w 0 (P,K). Using the best fit h(a, (M n /F p4s ) 2 , (M K /F p4s ) 2 ) = w 0 F p4s 
of Sec. Ill C 2, the implicit function is defined as 

wo(P,K) = h(0, P/(w 0 F p4s ) 2 , K/{w 0 F p4s f)/F p4s , (23) 

where F p4s is evaluated at physical quark masses and in the continuum. Note, the first 
parameter is set to 0, denoting the continuum limit. We fold 

w 0 (P, K) = 0.1809 - 0.0055 (2 K + P) + 0.0766 Pp P + 0.0948 K/j k (24) 

- 0.0018 (P - 4 K)^ + 0.0237 t] fi v - 0.0363 (2 K + P) 2 + 0.0063 (K - P) 2 

where fi z = zlog(z/A), with A = (M p /V&' kF p4s ) 2 ~ 0.3170, and r) = (4 K — P)/3. The 
fractional error in vj 0 (P, K) is approximately the same as for our continuum determination 
of Wo at physical masses, given in Eq. (21). Figure 11 plots this function over a large range 
of values of P and K. Values corresponding to the HISQ ensembles and to the physical-mass 
point are overlaid to give a sense of the range of meson masses for which this function is 
valid. The leading (2 K + P) dependence can be observed in the roughly linear shape for 
each line of constant K and the approximately constant vertical gap between lines of fixed 
K , independent of P. The separation of points within the clusters of physical strange-quark 
mass ensembles that were simulated close to the physical ratios mi/m s = 1/5,1/10, and 
1/27 is due to quark-mass mistunings and discretization errors. 

Using Eq. (24) and the results for wo,im P /a on the HISQ ensembles, we recalculate a(frn) 
for each ensemble and check to see that the results are consistent with the original lattice 
spacings set through F p 4s . Table VIII lists the lattice spacings determined through F p4s in 
Ref. [15] and w 0 in this work. The scales determined from Wo.imp are almost independent of 
quark masses for fixed /3, showing that the procedure is working as designed, and can be used 
to find consistent scales of new ensembles, even if they do not have physical quark masses. 
Lattice spacings determined from F p4s and tfo,im P on the physical quark-mass ensembles 
agree as the continuum limit is approached, and are close over the whole range of lattice 
spacings. This fitting procedure may be repeated to find y/to as a function of P' = (a/ f^M^) 2 
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FIG. 11. The continuum mass dependence of y/to and wq as a function of P = ( woM n ) 2 for fixed 
values of K = (wqMk ) 2 ■ The black points and the star illustrate the values of the pion and kaon 
masses that correspond to various HISQ ensembles and to the physical point, respectively. The 
three boxes on the plot of wq enclose the physical strange mass ensembles with different ratios of 
m'i/m' s . From the left- to the rightmost box the ratios are m!Jm! s = 1/27, 1/10, and 1/5. Similar 
boxes are not drawn for due to the large discretization effects separating the points that would 
go in each box. The lines corresponding to K = 0.1iv p h ys do not extend over the full domain of P 
because we restrict (wqM v ) 2 « (4 K 2 — P 2 )/3 




































and K' = ( y/toMu ) 2 - Because the central fit for y/toF p4:S does not include NNLO terms from 
chiral-perturbation theory, we redo the fit with NNLO terms added. We find 

Vto(P,K) = 0.1455 + 0.0007(2/1 + P) + 0.0994 Pfi P + 0.1336 K^k (25) 

+ 0.0002 (P - 4 K)n v + 0.0334 // fi v - 0.0214 (2 K + P) 2 - 0.0040 (K - P) 2 , 

where the notation and error determination are the same as for Wo, and the fractional error of 
y / to(P, K) is approximately that of t 0 in Eq. (20). The corresponding mass-dependence and 
lattice-spacing estimates are shown in Fig. 11 and Table VIII. As might be expected from 
the large slope seen for y Pq in Fig. 5, the lattice-spacing estimates show large discretization 
effects for the coarser ensembles. 

V. DISCUSSION AND CONCLUSIONS 

With the continuum results complete, we compare with computations of gradient-flow 
scales performed by other collaborations. Table IX shows a selection of those calculations and 
their final results in comparison with our own. The same results are also plotted in Fig. 12. 
Differences are shown divided by the joint error, except for the HPQCD Collaboration 
data. Because HPQCD uses a subset of the HISQ ensembles employed here, we do not 
use the joint sigma, which would double count several sources of error; instead, we use the 
larger of the two collaborations’ total error. Our results for both scales are compatible with 
those of the three other published continuum-limit calculations by HPQCD, HotQCD, and 
BMW; the largest difference is 1.9cr. Our best agreement is with HPQCD, the latter of 
which performed an independent analysis on the same HISQ configurations but without the 
a = 0.06 fm ensembles. We also agree with the published, singlc-lattice-spacing result for 
a= 0.1414(7) (5) fm from TWQCD [28]. Furthermore, we agree within 2cr with all but one 
collaboration’s preliminary results: y/% and Wq calculated by the ALPHA Collaboration with 
Nf = 2. This may be due to the difference in the number of flavors: Reference [29] has found 
stronger Nf dependence for y/to than for wq, which is consistent with the observed deviations 
between the ALPHA Collaboration’s preliminary results and those of this paper [29]. 

Finally, we compare the relative lattice scale found from y/t 0tOTig , u^orig, and other quan¬ 
tities used for scale setting. Here, we assume that the scale setting is being performed on 
ensembles with physical quark masses, so that extrapolation in quark mass is not required 
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TABLE VIII. Values of the lattice spacing determined from aE p 4 S [15], tco,imp/a, and \/to,imp/a 
on the physical strange-quark HISQ ensembles listed in Table I. The first two columns list the 
coupling (3 and the ratio of light- to strange-sea-quark masses, with the lattice dimensions appended 
as needed to uniquely identify each ensemble. Since we do not have a function corresponding to 
Eqs. (24) and (25) for F p 4 S , a mass-independent scale setting with F p 4 S is performed on the physical 
quark-mass ensembles only. The error listed with each estimate of a from the gradient-flow scales 
comes from the full error of wq(P,K) or A'). Errors from quark-mass mistunings are not 

included; in other words we are finding the scale at the actual simulation values of the quark 
masses, rather than at the intended values. 


p 

m't/m's 

[o-Fp/^s) / 

w 0 /(w 0t imp/a) (fm) 

Vto/ (v^o, imp/a) (fm) 

5.80 

1/5 


0.1511® 

0.1410® 

5.80 

1/10 


0.1511C!?) 

0.1413(1 jj) 

5.80 

1/27 

0.15305(±|X) 

0.1509(® 

0.1413(1 jj) 

6.00 

1/5 


0.1206(® 

0.1162(l 1 g) 

6.00 

1/10 (32 3 x 64) 


0.1206(l 1 g) 

0.1163(1®) 

6.00 

1/10 (40 3 x 64) 


0.1207(l 1 g) 

0.1163(1®) 

6.00 

1/27 

0 . 12232 (^ 33 ) 

0.1206(l 1 g) 

0.1163(1*) 

6.30 

1/5 


0.0873(l 1 g) 

0.0855(1®) 

6.30 

1/10 


0.0874(1®) 

0.0857(1®) 

6.30 

1/27 

0.08791(^24) 

0.0875(1®) 

0.0858(1|) 

6.72 

1/5 


0.0566(lg) 

0.0561(1®) 

6.72 

1/10 


0.0565(1®) 

0.056®) 

6.72 

1/27 

0.05672(^1) 

0.0566(1|) 

0.0562(1 3 ) 


for any quantity. In that case, the systematic errors associated with extracting any of these 
scales on a given, physical-mass ensemble are generally significantly smaller than the sta¬ 
tistical errors, with the possible exception of rl/a at finer lattice spacings, for which errors 
in extracting asymptotic energies may become significant. Table X compares the percent 
statistical error for various scale-setting quantities in lattice units measured on the HISQ 
physical quark-mass ensembles. Both gradient-flow scales are determined more precisely 
than ri/a and af v . The precision of y/t 0tOI i g /a is higher than, and the precision of w 0 ^ orig /a 
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TABLE IX. Continuum results for the gradient-flow scales y/to and wo from different collaborations 
[5, 29-33]. The last two columns tabulate the difference between the results of other collaborations 
and this work, relative to one joint sigma. For HPQCD, whose errors are not independent of ours, 
we simply use the larger error for the comparison. Results of the three collaborations marked with 
an asterisk are the preliminary conference results. 


Collaboration 

N f 

Vt~ 0 (fm) 

Ay/to/a 

w 0 (fm) 

Awo/a 

MILC [This work] 

2+1+1 

0.1416(1)(±|) 


0.1714(2)0 


HPQCD [30] 

2+1+1 

0.1420(8) 

+0.4 

0.1715(9) 

+0.1 

ETMC* [31] 

2+1+1 



0.1782 


HotQCD [33] 

2+1 



0.1749(14) 

+1.8 

BMW [5] 

2+1 

0.1465(21)(13) 

+1.9 

0.1755(18)(04) 

+1.7 

QCDSF-UKQCD* [32] 

2+1 

0.153(7) 

+1.6 

0.179(6) 

+1.2 

ALPHA* [29] 

2 

0.1535(12) 

+8.3 

0.1757(13) 

+2.2 


is on par with, the most precise of the other scales, aF p4s . This small statistical error was an 
original motivation for computing the scale from gradient flow. Note further that y/t 0 orig /a 
and tco,orig/a have only been determined on a small subset of the configurations at finer 
lattice spacings, while the aF p ^ s values come from the entire ensembles, so there is con¬ 
siderable room for improvement for the gradient-flow scales. In addition, lower systematic 
errors—in particular, low dependence on quark masses—may make the gradient-flow scales 
preferable to aF pis for relative scale setting, especially when scales are needed for ensem¬ 
bles with unphysical quark masses or with significant quark-mass tuning errors. Statistical 
errors for r«o,orig /a are larger than those of y/to^ng/a. This is one factor, although not the 
dominant factor, that leads to our slightly more precise continuum extrapolated value for 
y/to compared to wq. On the other hand, Fig. 10 illustrates that the discretization effects 
for iCo,orig are much smaller than those for ^/t 0 ,orig when compared with the reference scale 
aF p4:S . It is conceivable that the small slope for -u; 0]Orig and u+imp is due to an accidental 
cancellation between their discretization errors and those of F p4s . However, when combined 
with the empirical evidence given in Ref. [5], it appears more likely that Wo has “intrinsi¬ 
cally” smaller a 2 dependence than y/% in the sense that the ratio of Wq to most common 
reference scales will have smaller discretization errors than the corresponding ratio for y/tQ. 
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FIG. 12. The continuum values of y/to and vjq separated by collaboration and grouped by the 
number of flavors. References for each collaboration’s work can be found in Table IX. Those 
results for collaborations marked with an asterisk are preliminary. Our results for yTo and wq 
are consistent within 2 standard deviations to all other results except the preliminary calculations 
from the ALPHA Collaboration. 

Finally, we remark that the small error of aF p4s , in comparison with that of af n , is what 
motivates us to use aF p4s for our continuum extrapolations of the gradient-flow scales, as 
discussed in Sec. IIIB4. 

In conclusion, we have computed the continuum, physical-mass values of y/to and wo, 
and find a /to = 0.1416(1|) fm and u>o = 0.1714(1^) fm, in reasonable agreement with 
most independent calculations, and in excellent agreement with the results of HPQCD, 
who used a subset of the same HISQ ensembles employed here. We have estimated the 
integrated autocorrelation lengths at different lattice spacings and found autocorrelation 
lengths comparable to that of the topological charge, although the errors at the finer lattice 
spacing (a ~ 0.09fm) are quite large. Compared to our preliminary work, the continuum 
extrapolation here is better controlled through the removal of tree-level discretization errors, 
the use of aF p4s over af n to set the scale, and the use of priors to suppress outlying fits 




140 0.145 0.150 0.155 oTl 6 $' 0.170 0.175 0.180 0.185 
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QCDSF-UKQCD* '14 
BMW '12 
HotQCD '14 

ETMC* '12 
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This Work '15 
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TABLE X. Percent statistical error for several scale-setting quantities including n, f n , F p 4 S , and 
the gradient-flow scales \Ao,orig/a and u^orig/® on the physical quark-mass HISQ ensembles listed 
in Tables I and IV. The statistical errors in the improved scales y / ?oJmp an d ,( A),imp are comparable 
to the original gradient-flow scales. The first column is the approximate lattice spacing and can 
be used to uniquely identify each ensemble. 


ss a(fm) 


Statistical Error (%) 


ri/a 

r 

O'FpAs 

y/to/d 

w 0 /a 

0.15 

0.36 

0.11 

0.06 

0.02 

0.03 

0.12 

0.25 

0.08 

0.06 

0.04 

0.07 

0.09 

0.33 

0.09 

0.06 

0.03 

0.07 

0.06 

0.12 

0.11 

0.06 

0.03 

0.06 


that have unreasonable lattice-spacing or charm-mass dependence. Further, the quark-mass 
interpolation has been constrained using chiral-perturbation theory, and the effect of charm- 
mass mistunings have been taken into account up to next-to-leading order. Finally, we have 
calculated the continuum meson-mass dependence for use in future scale-setting applications. 
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